Abstract. Building a repository of proof-checked mathematical knowledge
is without any doubt a lot of work, and besides the actual formalization
process there is also the task of maintaining the repository. Thus
it seems obvious to keep a repository as small as possible; in particular,
each piece of mathematical knowledge should be formalized only once.
In this paper, however, we claim that it might be reasonable or even
necessary to duplicate knowledge in a mathematical repository. We analyze
different situations and reasons for doing so and provide a number of
examples supporting our thesis.

Mathematical knowledge management aims at providing both tools and
infrastructure supporting the organization, development, and also teaching of
mathematics using modern techniques provided by computers. Consequently, large
repositories of mathematical knowledge are of major interest because they
provide users with a database of verified mathematical knowledge. We
emphasize the fact that a repository should contain verified knowledge only,
together with the corresponding proofs. We believe that (machine-checked or
machine-checkable) proofs necessarily belong to each theorem and therefore are
an essential part of a repository.

However, mathematical repositories should be more than collections of the- 
orems and their proofs accomplished by a prover or proof checker. The overall 
goal here is not only stating and proving a theorem — though this remains 
an important and challenging part — but also presenting definitions and theo- 
rems so that the "natural" mathematical buildup remains visible. Theories and 
their interconnections should be available, so that the further development of 
the repository can be based upon these. Being not trivial as such, this becomes 
even harder to assure for an open repository with a large number of authors. 

In this paper we deal with yet another organizational aspect of building
mathematical repositories: the duplication of knowledge, by which we mean that a
repository includes redundant knowledge. At first glance this may look
unacceptable or at least unnecessary. Why should one include, and hence
formalize, the same thing more than once? A closer inspection, however, shows
that mathematical redundancy may occur in different non-trivial facets:
Different proofs of a theorem may exist, or different versions of a theorem
formulated in different contexts. Sometimes we even have different
representations of the same mathematical object serving different purposes.

From the mathematical point of view this is not only harmless but also
desirable; it is part of mathematical progress that theorems and definitions
change and evolve. In mathematical repositories, however, each duplication of
knowledge causes an additional amount of work. In this paper we analyze
miscellaneous situations and reasons why there could, and should, be at least
some redundancy in mathematical repositories. These situations range from the
above-mentioned duplication of proofs, theorems and representations to the
problem of generalizing knowledge. Even technical reasons due to the progress
of a repository may lead to duplication of knowledge.

2 Different Proofs of a Theorem 

The Chinese Remainder Theorem is a result about congruences over the integers. 
It states that an integer u can be completely described by the sequence of its 
remainders — if the number of remainders is big enough. The "standard" version 
of the theorem reads as follows. 

Theorem 1. Let $m_1, m_2, \ldots, m_r$ be positive integers such that $m_i$ and
$m_j$ are relatively prime for $i \neq j$. Let $m = m_1 m_2 \cdots m_r$ and let
$u_1, u_2, \ldots, u_r$ be integers. Then there exists exactly one integer $u$ with

$0 \le u < m$ and $u \equiv u_i \pmod{m_i}$ for all $1 \le i \le r$. □

In the following we present three different proofs of the theorem and discuss
the merits of including them in mathematical repositories. It is very easy to
show that there exists at most one such integer u; in the following proofs we
therefore focus on proving the existence of u. The proofs are taken from [Knu97].

First proof: Suppose integer $u$ runs through the $m$ values $0 \le u < m$. Then
$(u \bmod m_1, \ldots, u \bmod m_r)$ also runs through $m$ different values,
because the system of congruences has at most one solution. Because there are
exactly $m_1 m_2 \cdots m_r = m$ different tuples $(v_1, \ldots, v_r)$ with
$0 \le v_i < m_i$, every tuple occurs exactly once, and hence for one of those
we have $(u \bmod m_1, \ldots, u \bmod m_r) = (u_1, \ldots, u_r)$. □
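The counting argument of the first proof can be checked by brute force for small moduli. The following is only an illustrative sketch: the function names `residue_map` and `is_bijection_onto_tuples` and the sample moduli are assumptions made for this example and do not come from [Knu97].

```python
from itertools import product
from math import prod

def residue_map(moduli):
    """Map every u in range(m) to its tuple (u mod m_1, ..., u mod m_r)."""
    m = prod(moduli)
    return {u: tuple(u % mi for mi in moduli) for u in range(m)}

def is_bijection_onto_tuples(moduli):
    """Check that every tuple (v_1, ..., v_r) with 0 <= v_i < m_i occurs exactly once."""
    images = list(residue_map(moduli).values())
    all_tuples = set(product(*(range(mi) for mi in moduli)))
    return len(set(images)) == len(images) and set(images) == all_tuples

print(is_bijection_onto_tuples([3, 5, 7]))  # pairwise coprime moduli
print(is_bijection_onto_tuples([4, 6]))     # moduli sharing the factor 2
```

Like the proof itself, the check is non-constructive in spirit: it confirms that a solution exists by exhausting all m candidates and gives no shortcut for finding it.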

This proof is pretty elegant and uses a rather obvious variant of the
pigeonhole principle: If we pack m items without repetition into m buckets,
then we must have exactly one item in each bucket. It is therefore valuable to
include this proof in a repository for didactic or aesthetic reasons. On the
other hand, formalization of the proof is not necessarily straightforward. One
has to argue about the number of different r-tuples and, more importantly, to
show that there exists a bijection between the set of r-tuples and the
non-negative integers smaller than m. Another disadvantage is that the proof is
non-constructive, so that it gives no hint on how to find the value of u
besides the rather valueless "Try and check all possibilities, one will fit".
This is even more disturbing because a constructive proof can easily be given:

Second proof: We can find integers $M_i$ for $1 \le i \le r$ with

$M_i \equiv 1 \pmod{m_i}$ and $M_i \equiv 0 \pmod{m_j}$ for $j \neq i$.

Because $m_i$ and $m/m_i$ are relatively prime, we can take for example

$M_i = (m/m_i)^{\varphi(m_i)}$,

where $\varphi$ denotes the Euler function. Now,

$u = (u_1 M_1 + u_2 M_2 + \cdots + u_r M_r) \bmod m$

has the desired properties. □
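The construction of the second proof translates directly into a short program. This is a minimal sketch under stated assumptions: the names `euler_phi` and `crt_euler` are made up for this example, Euler's function is computed by naive counting (adequate for small moduli only), and each M_i is reduced modulo m, which keeps the numbers small without changing any of the congruences.

```python
from math import gcd, prod

def euler_phi(n):
    """Euler's function by naive counting; only sensible for small n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def crt_euler(residues, moduli):
    """Solve u = u_i (mod m_i) via M_i = (m/m_i)^phi(m_i), as in the second proof."""
    m = prod(moduli)
    u = 0
    for ui, mi in zip(residues, moduli):
        # M_i is 1 modulo m_i (Euler's theorem) and 0 modulo every other m_j.
        Mi = pow(m // mi, euler_phi(mi), m)
        u += ui * Mi
    return u % m

u = crt_euler([2, 3, 2], [3, 5, 7])
assert [u % mi for mi in (3, 5, 7)] == [2, 3, 2]
```

For the moduli 3, 5, 7 the reduced constants are 70, 21 and 15, and the residues 2, 3, 2 yield u = 23.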

This proof constructs r constants $M_i$ with which the sought-after u can easily
be computed. It therefore, in some sense, contains more information than the
first proof, information that should also be contained in the repository. The
proof uses far more advanced mathematical notions, namely Euler's function, and
for that reason may also be considered more interesting than the first one.
Formalization requires the use of Euler's function 3 which may cause some
preliminary work. From a computer science point of view the proof has two
disadvantages. First, it is not easy to compute Euler's function; in general
one has to decompose the moduli $m_i$ into their prime factors. Second, the
$M_i$, being multiples of $m/m_i$, are really big numbers, so that a better
method for computing u is highly desirable. Such a method has indeed been found
by H. Garner, which gives a third proof of Theorem 1:

Third proof: Because we have $\gcd(m_i, m_j) = 1$ for $i \neq j$ we can find
integers $c_{ij}$ for $1 \le i < j \le r$ with

$c_{ij} m_i \equiv 1 \pmod{m_j}$

by applying the extended Euclidean algorithm to $m_i$ and $m_j$. Now taking

$v_1 = u_1 \bmod m_1$
$v_2 = (u_2 - v_1) c_{12} \bmod m_2$
$v_3 = ((u_3 - v_1) c_{13} - v_2) c_{23} \bmod m_3$
$\vdots$
$v_r = (\ldots((u_r - v_1) c_{1r} - v_2) c_{2r} - \cdots - v_{r-1}) c_{(r-1)r} \bmod m_r$

and then setting

$u := v_r m_{r-1} \cdots m_2 m_1 + \cdots + v_3 m_2 m_1 + v_2 m_1 + v_1$

we get the desired integer $u$. □

3 Actually a mild modification of the second proof works without Euler's function.

The proof uses $\binom{r}{2}$ constants $c_{ij}$ that can be computed with the
extended Euclidean algorithm because we have $\gcd(m_i, m_j) = 1$ for
$i \neq j$. When constructing the $v_i$, the application of the modulo operation
in each step ensures that the occurring values remain small. The proof is far
more technical than the others in constructing $\binom{r}{2} + r$ additional
constants, the $v_i$ in addition being recursively defined. On the other hand,
however, this proof includes an efficient method to compute the integer u from
Theorem 1.
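The recursive definition of the v_i and the final mixed-radix sum can be sketched as follows. The function name `garner` is an assumption made for this example, and the inverses c_ij are obtained with Python's built-in `pow(a, -1, n)` (Python 3.8 or later) instead of an explicit extended Euclidean algorithm; in practice the c_ij would be precomputed once per set of moduli.

```python
def garner(residues, moduli):
    """Compute u with u = u_j (mod m_j) for pairwise coprime moduli, following the third proof."""
    v = []
    for j, (uj, mj) in enumerate(zip(residues, moduli)):
        t = uj
        for i in range(j):
            c_ij = pow(moduli[i], -1, mj)   # c_ij * m_i = 1 (mod m_j)
            t = ((t - v[i]) * c_ij) % mj    # reduce in every step to keep values small
        v.append(t % mj)
    # u = v_1 + v_2*m_1 + v_3*m_2*m_1 + ... + v_r*m_(r-1)*...*m_1
    u, weight = 0, 1
    for vj, mj in zip(v, moduli):
        u += vj * weight
        weight *= mj
    return u

u = garner([2, 3, 2], [3, 5, 7])
assert [u % mi for mi in (3, 5, 7)] == [2, 3, 2]
```

All intermediate values stay below the respective modulus; only the final sum reaches the size of m.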

We see that the question of which proof of a theorem should be formalized
does not only depend on the hardness of the formalization in a given system.
Both elegance and the amount of information are issues that can be taken into
consideration; this may even result in formalizing more than one proof.

3 Different Versions of Theorems

There are quite a number of reasons why different versions of the same theorem 
exist and may be included in mathematical repositories. Besides mathematical 
issues we also identified reasons justified by formalization issues or the develop- 
ment of repositories itself. For illustration we again use the CRT as an example. 

3.1 Restricted Versions 

Theorems are not always proved with a proof assistant in order to be included
in a repository in the first place: Maybe the main goal is to illustrate or
test a newly implemented proof technique or just to show that this special kind
of mathematics can be handled within the particular system. In this case it is
often sufficient, or simply easier, to prove a weaker or restricted version of
the original theorem from the literature.

In HOL Light [Har10], for example, we find the following theorem.

  # INTEGER_RULE
     `!a b u v:int. coprime(a,b) ==>
        ?x. (x == u) (mod a) /\ (x == v) (mod b)`;;

This is a version of the Chinese Remainder Theorem (Theorem 1) stating that in
the case of only two moduli a and b there exists a simultaneous solution x of
the congruences. Similar versions have been shown with hol98 [Hur03], the Coq
proof assistant [Men10] or Rewrite Rule Laboratory [ZH92].
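The two-moduli version has a particularly short constructive witness. The sketch below is not taken from any of the cited formalizations; the names `ext_gcd` and `crt_two` are hypothetical, and the moduli are assumed to be positive.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = ext_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def crt_two(u, v, a, b):
    """Return x with x = u (mod a) and x = v (mod b) for coprime positive a, b."""
    g, s, t = ext_gcd(a, b)
    if g != 1:
        raise ValueError("moduli must be coprime")
    # s*a + t*b = 1, hence t*b = 1 (mod a) and s*a = 1 (mod b).
    return (u * t * b + v * s * a) % (a * b)

x = crt_two(2, 3, 3, 5)
assert x % 3 == 2 and x % 5 == 3
```

Applying such a two-moduli step repeatedly yields the general case, which is one reason why restricted versions are often considered sufficient for a first formalization.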

From the viewpoint of mathematical repositories it is of course desirable to
have the full version of the theorem included as well. Can we, however, in this
case easily set the restricted version aside? Note that the above theorem in
HOL Light also serves as a rule for proving divisibility properties of the
integers. Erasing the restricted version then means that the full version has
to be used instead. It is hardly foreseeable whether this will work for all
other proofs relying on the restricted version. So, probably sometimes both the
restricted and the full version belong in the repository.

3.2 Different Mathematical Versions 

The most natural reason for different versions of theorems is that mathematicians 
often look at the same issue from different perspectives. The CRT presented in 
Section 2 deals with congruences over the integers: it states the existence of an 
integer solving a given system of congruences. Looking from a more algebraic 
point of view we see that the moduli $m_i$ can be interpreted as describing the
residue class rings $\mathbb{Z}_{m_i}$. The existence and uniqueness of the integer u from the
CRT then gives rise to an isomorphism between rings [GG99]: 

Theorem 2. Let $m_1, m_2, \ldots, m_r$ be positive integers such that $m_i$ and
$m_j$ are relatively prime for $i \neq j$ and let $m = m_1 m_2 \cdots m_r$. Then
we have the ring isomorphism

$\mathbb{Z}_m \cong \mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_r}$. □

This version of the CRT has been formalized in hol98 [Hur03]. Here we find a
two-moduli version that in addition is restricted to multiplicative groups.
Technically, the theorem states that for relatively prime moduli p and q the
function $\lambda x.(x \bmod p, x \bmod q)$ is a group isomorphism between
$\mathbb{Z}_{pq}$ and $\mathbb{Z}_p \times \mathbb{Z}_q$.

  ⊢ ∀p, q.
      1 < p ∧ 1 < q ∧ gcd p q = 1 ⇒
      (λx. (x mod p, x mod q)) ∈
        group_iso (mult_group pq)
          (prod_group (mult_group p) (mult_group q))

Note that, in contrast to Theorem 2, the isomorphism is part of the theorem 
itself and not hidden in the proof. 
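For small moduli the explicit map can be tested exhaustively. The sketch below checks the ring version of the statement (Theorem 2 with two moduli) rather than the multiplicative-group version proved in hol98; the name `is_ring_iso` is an assumption made for this example.

```python
def is_ring_iso(p, q):
    """Check by brute force that x -> (x mod p, x mod q) is a ring isomorphism
    from Z_pq onto Z_p x Z_q."""
    n = p * q

    def f(x):
        return (x % p, x % q)

    # Domain and codomain both have n elements, so injective means bijective.
    bijective = len({f(x) for x in range(n)}) == n
    respects_ops = all(
        f((x + y) % n) == ((f(x)[0] + f(y)[0]) % p, (f(x)[1] + f(y)[1]) % q)
        and f((x * y) % n) == ((f(x)[0] * f(y)[0]) % p, (f(x)[1] * f(y)[1]) % q)
        for x in range(n)
        for y in range(n)
    )
    return bijective and respects_ops

print(is_ring_iso(3, 5))  # coprime moduli
print(is_ring_iso(2, 4))  # gcd 2: the map is still a homomorphism but not injective
```

The map respects addition and multiplication for any p and q; coprimality is needed only for bijectivity, which is where the content of the CRT lies.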

It is not easy to decide which version of the CRT may be better suited for
inclusion in a mathematical repository. Theorem 2 looks more elegant and in
some sense contains more information than Theorem 1: It does not state the
existence of a special integer, but the equality of two mathematical structures.
The proof of Theorem 2 uses the homomorphism theorem for rings and is therefore
interesting for didactic reasons, too. On the other hand, Theorem 1 uses
integers and congruences only, so that one needs fewer preliminaries to
understand it. Theorem 1 and its proof also give more information than
Theorem 2 concerning computational issues 4 (at least if more than just the
first proof has been formalized).



4 To apply the homomorphism theorem in the proof of Theorem 2 one needs to show 
that the canonical homomorphism is a surjection with kernel (m). This sometimes 
is done by employing the extended Euclidean algorithm, so that this proof gives an 
algorithm, too. 

3.3 Different Technical Versions

Another reason for different versions of a theorem may originate in the
mathematical repository itself. Here again open repositories play an important
role: Different authors, hence different styles of formalizing and different kinds
of mathematical understanding and preferences meet in one repository. So, it
may happen that two authors formalize the same (mathematical) theorem, but
choose a different formulation and/or a different proof. We call these technical
versions.

Especially in evolving systems such versions may radically differ just because 
the system's language improved over the years. In the Mizar Mathematical Li- 
brary, for example, we find the following CRT [Sch08] 

theorem
  for u being integer-yielding FinSequence,
      m being CR_Sequence st len u = len m
  ex z being Integer
  st 0 <= z & z < Product(m) & for i being natural number
     st i in dom u holds z,u.i are_congruent_mod m.i;

Here, a CR_Sequence is a sequence of natural numbers which are pairwise
relatively prime. Note that the formulation of the CRT is very close to the
textbook version, Theorem 1.

In another Mizar article [Kon97], however, we find a different formulation of 
the CRT: 

theorem :: WSIERP_1:44
  len fp>=2 &
  (for b,c st b in dom fp & c in dom fp & b<>c holds (fp.b gcd fp.c)=1)
  implies for fr st len fr=len fp holds ex fr1 st (len fr1=len fp &
  for b st b in dom fp holds (fp.b)*(fr1.b)+(fr.b)=(fp.1)*(fr1.1)+(fr.1));

In this version no attributes are used. The condition that the $m_i$ are pairwise
relatively prime is here stated explicitly using the gcd functor for natural
numbers. Also the congruences are described arithmetically: $u \equiv u_i \pmod{m_i}$
means that there exists an $x_i$ such that $u = u_i + x_i \cdot m_i$, so the
theorem basically states the existence of $x_1, \ldots, x_r$ instead of $u$.
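The relation between the two formulations is easy to make explicit: given a solution u of the congruences, the witnesses x_i are simply the quotients (u - u_i)/m_i. The following sketch uses the hypothetical name `witnesses` and is meant only to illustrate the arithmetical reading of the congruences, not the Mizar proof.

```python
def witnesses(u, residues, moduli):
    """Return [x_1, ..., x_r] with u = u_i + x_i * m_i for every i.

    Raises ValueError if u does not solve one of the congruences.
    """
    xs = []
    for ui, mi in zip(residues, moduli):
        x, rem = divmod(u - ui, mi)
        if rem != 0:
            raise ValueError(f"{u} is not congruent to {ui} modulo {mi}")
        xs.append(x)
    return xs

xs = witnesses(23, [2, 3, 2], [3, 5, 7])
# Every expression u_i + x_i * m_i evaluates to the same integer u = 23.
assert all(ui + x * mi == 23 for ui, x, mi in zip([2, 3, 2], xs, [3, 5, 7]))
```

In the second Mizar formulation the common value u never appears; instead all expressions are required to be equal to the one with index 1.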

Since the article was written more than 10 years ago, a reason for this
technical formulation is hard to find. It may be that at the time of writing
Mizar's attribute mechanism was not as far developed as it is today, i.e. the
author reformulated the theorem in order to get it formalized at all. Another
explanation for this second technical version might be that the author, when
formalizing the CRT, already had a particular application in mind and therefore
chose a formulation better suited to prove the application.

In the Coq Proof Assistant [Coq10] the CRT has been proved for a bit vector
representation of the integers [Men10], though as a restricted version of
Theorem 1 with two moduli a and b.

Theorem chinese_remaindering_theorem :
  forall a b x y : Z,
    gcdZ a b = 1%Z -> {z : Z | congruentZ z x a /\ congruentZ z y b}.

In fact this theorem and its proof are the result of rewriting a former proof of 
the CRT in Coq. So in Coq there exist two versions of the CRT — though the 
former one has been declared obsolete. 

We see that in general the way authors use open systems to formalize theorems
has a crucial impact on the formulation, that is on the technical version of
a theorem, and may lead to different versions of the same theorem. Removing
one version, usually the older one, is a dangerous task: In large repositories
it is not clear whether all proofs relying on the deleted version can be easily
changed to work with the other one. So often both versions remain in the
repository.

4 Abstract and Concrete Mathematics 

Practically every mathematical repository has a notion of groups, rings, fields 
and many more abstract structures. The advantage is obvious: A theorem shown 
to hold in an abstract structure is also true in every concrete structure of this 
type. This can help to keep a repository small: Even if concrete structures are
defined there is no need to repeat theorems following from the abstract structure. 
If necessary in a proof for a concrete structure one can just use the theorem 
proved for the abstract structure. 

Nevertheless authors tend to prove theorems again for the concrete case. We 
can observe this phenomenon in the Mizar Mathematical Library (MML). There 
we find, for example, the following theorem about groups. 

theorem
  for V being Group
  for v being Element of V holds v - v = 0.V;

For a number of concrete groups (rings or fields) this theorem, however, has
been proved and stored in MML again, among them complex numbers and
polynomials.

theorem 

for a being complex number holds a - a = 0; 

theorem
  for L be add-associative right_zeroed right_complementable
           (non empty addLoopStr)
  for p be Polynomial of L holds p - p = 0_.(L);

One reason might be that authors are not aware of the abstract theorems 
they can use and therefore think that it is necessary to include theorems in the 
concrete case. This might be especially true, if authors work on applications 
rather than on "core" mathematics. On the other hand it might just be more 
comfortable for authors to work solely in the concrete structure rather than to
switch between concrete and abstract structures while proving theorems in a 
concrete structure. 

Constructing new structures from already existing ones sometimes causes
a similar problem: Shall we formalize a more concrete or a more abstract
construction? Multivariate polynomials, for example, can be recursively
constructed from univariate polynomials using $R[X, Y] = (R[X])[Y]$, or more
concretely as functions from terms in X and Y into the ring R. Which version is
better suited for mathematical repositories? This is hard to say. From a
mathematical point of view the first version is the more interesting
construction. The second one, however, seems more intuitive and may be more
convenient to apply in other areas where polynomials are used. So, it might be
reasonable to include both constructions in a repository. In this case,
however, theorems about polynomials will be duplicated as well.
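The two constructions correspond to two quite different data structures, which is why results proved for one do not transfer automatically to the other. As a rough illustration (all names are assumptions made for this example), a polynomial in (R[X])[Y] can be held as a list of coefficient lists, and the term-based version as a mapping from exponent pairs to coefficients.

```python
def nested_to_terms(nested):
    """Convert from (R[X])[Y] form to term form.

    nested[j][i] is the coefficient of X**i * Y**j; the result maps
    exponent pairs (i, j) to nonzero coefficients.
    """
    return {(i, j): c
            for j, row in enumerate(nested)
            for i, c in enumerate(row)
            if c != 0}

def eval_nested(nested, x, y):
    """Evaluate by first evaluating each inner polynomial in X, then the outer one in Y."""
    return sum(sum(c * x**i for i, c in enumerate(row)) * y**j
               for j, row in enumerate(nested))

def eval_terms(terms, x, y):
    """Evaluate term by term."""
    return sum(c * x**i * y**j for (i, j), c in terms.items())

# p = 1 + 2*X*Y + 3*Y**2 in both representations
nested = [[1], [0, 2], [3]]
terms = nested_to_terms(nested)
assert eval_nested(nested, 2, 3) == eval_terms(terms, 2, 3)
```

Even this small example needs a conversion function and an agreement lemma (both evaluations coincide); in a repository every theorem about one representation would need a counterpart or a transfer argument for the other.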

We close this section with another example: rational functions. Rational
functions can be constructed as pairs of polynomials or as the completion
$K(X)$ of the polynomial ring $K[X]$. As in the case of multivariate
polynomials both constructions have merits of their own, so again both may be
included in a repository. Note that this eventually might result in another
(two) concrete version(s) of the theorem about groups from above, e.g.

theorem
  for L being Field
  for z being Rational_Function of L
  holds z - [0_.(L),1_.(L)] = z;

5 Representational Issues 

In the majority of cases it does not play a major role how mathematical ob- 
jects are represented in repositories. Whether the real numbers, for example, are 
introduced axiomatically or are constructed as the Dedekind-completion of the 
rational numbers, has actually no influence on later formalizations using real 
numbers. Another example is ordered pairs: Here we can apply Kuratowski's
or Wiener's definition, that is

$(a, b) = \{\{a\}, \{a, b\}\}$

or

$(a, b) = \{\{\{a\}, \emptyset\}, \{\{b\}\}\}$

or even again the axiomatic approach

$(a_1, b_1) = (a_2, b_2)$ if and only if $a_1 = a_2$ and $b_1 = b_2$.

Once one of these notions is included in a repository, formalizations relying
on it can be carried out in more or less the same way.
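That the choice of encoding is irrelevant for later use can be illustrated with hashable sets: both set-theoretic definitions satisfy the characteristic property of ordered pairs although they produce different sets. The function names below are assumptions made for this sketch.

```python
def kuratowski(a, b):
    """Kuratowski pair {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def wiener(a, b):
    """Wiener pair {{{a}, {}}, {{b}}}."""
    return frozenset({frozenset({frozenset({a}), frozenset()}),
                      frozenset({frozenset({b})})})

def has_pair_property(pair, values):
    """Check pair(a, b) == pair(c, d) iff a == c and b == d over a finite test set."""
    return all((pair(a, b) == pair(c, d)) == (a == c and b == d)
               for a in values for b in values
               for c in values for d in values)

assert has_pair_property(kuratowski, range(3))
assert has_pair_property(wiener, range(3))
assert kuratowski(1, 2) != wiener(1, 2)  # different sets, same behaviour
```

Any development that uses only the characteristic property works unchanged with either definition, which matches the observation that the representation plays no major role here.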

There are, however, mathematical objects having more than one interesting
representation. The most prominent example is polynomials. Polynomials can
be straightforwardly constructed as sequences (of coefficients) over a ring

$p = (a_n, a_{n-1}, \ldots, a_0)$

or as functions from the natural numbers into a ring

$p = f : \mathbb{N} \to R$ where $|\{x \mid f(x) \neq 0\}| < \infty$.

Note that both representations explicitly mention all zero coefficients of a
polynomial, that is, they provide a dense representation.

There is an alternative seldom used in repositories: sparse polynomials. In
this representation only coefficients not equal to 0 are taken into account, at
the cost that exponents have to be attached. We thus get a list of pairs:

$p = ((e_1, a_1), (e_2, a_2), \ldots, (e_m, a_m))$.

Though the sparse representation is technically more demanding to deal with
(which is probably the reason why a dense representation is usually chosen for
formalization), there exist a number of efficient algorithms based on it, for
example interpolation and computation of integer roots. Therefore it seems
reasonable to formalize both representations in a repository, thus reflecting
the mathematical treatment of polynomials.
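The difference between the two representations shows up as soon as a polynomial has few nonzero coefficients. The sketch below follows the conventions used above (dense: coefficients from the leading one down to the constant term; sparse: pairs of exponent and coefficient); the function names are assumptions made for this example.

```python
def dense_to_sparse(dense):
    """dense = [a_n, ..., a_0]; return [(e, a), ...] for the nonzero coefficients only."""
    n = len(dense) - 1
    return [(n - k, a) for k, a in enumerate(dense) if a != 0]

def sparse_to_dense(sparse):
    """Inverse conversion; the zero polynomial is represented by the empty list."""
    if not sparse:
        return []
    n = max(e for e, _ in sparse)
    dense = [0] * (n + 1)
    for e, a in sparse:
        dense[n - e] = a
    return dense

def eval_sparse(sparse, x):
    """Evaluate using one power per nonzero term."""
    return sum(a * x**e for e, a in sparse)

# x**100 + 1: 101 dense coefficients versus 2 sparse pairs
dense = [1] + [0] * 99 + [1]
sparse = dense_to_sparse(dense)
assert len(sparse) == 2 and sparse_to_dense(sparse) == dense
```

A repository offering both representations would need such conversion functions together with proofs that they are mutually inverse, which is part of the additional work that duplication causes.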

Another example is the representation of matrices, also a rather basic
mathematical structure. The point here is that there exist many interesting subclasses
of matrices, for example block matrices for which a particular multiplication al- 
gorithm can be given or triangular matrices for which equations are much easier 
to solve. Hence it might be reasonable to include different representations of 
matrices, that is different (re-) definitions, in a repository to provide support for 
particular applications of matrices. 

6 Generalization of Theorems 

Generalization of theorems is an everyday occurrence in mathematics. In the case
of mathematical repositories generalization is a rather involved topic: It is not 
obvious whether the less general theorem can be eliminated. Proofs of other 
theorems using the original version might not work automatically with the more 
general theorem instead. The reason may be that a slightly different formulation 
or even a different (mathematical or technical) version of the original theorem 
has been formalized. Then the question is: Should one rework all these proofs 
or keep both the original and the more general theorem in the repository? To 
illustrate that this decision is both not trivial and important for the organization 
of mathematical repositories we present in this section some generalizations of 
the CRT. 

A rather harmless generalization of Theorem 1 is based on the observation
that the range in which the integer u lies does not need to be fixed. It is
sufficient that it has the width $m = m_1 m_2 \cdots m_r$. This easily follows
from the properties of the congruence relation $\equiv$.

Theorem 3. Let $m_1, m_2, \ldots, m_r$ be positive integers such that $m_i$ and
$m_j$ are relatively prime for $i \neq j$. Let $m = m_1 m_2 \cdots m_r$ and let
$a, u_1, u_2, \ldots, u_r$ be integers. Then there exists exactly one integer $u$
with

$a \le u < a + m$ and $u \equiv u_i \pmod{m_i}$

for all $1 \le i \le r$. □

It is trivial that for a = 0 we get the original Theorem 1. Old proofs can very
easily be adapted to work with this generalization of the theorem. Maybe the
system checking the repository even automatically infers that Theorem 3 with
a = 0 substitutes the original theorem. If not, however, even the easy task of
changing all the proofs to work with the generalization can be extensive,
unpleasant, and time-consuming.
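On the computational side the step from Theorem 1 to Theorem 3 is a single modular shift: any solution u can be moved into the window from a (inclusive) to a + m (exclusive) without changing its remainders. A minimal sketch, with the hypothetical name `shift_into_window`:

```python
def shift_into_window(u, a, m):
    """Return the unique u' with a <= u' < a + m and u' = u (mod m)."""
    return a + (u - a) % m

# u = 23 solves u = 2 (mod 3), u = 3 (mod 5), u = 2 (mod 7); here m = 105.
shifted = shift_into_window(23, 100, 105)
assert 100 <= shifted < 205
assert [shifted % mi for mi in (3, 5, 7)] == [2, 3, 2]
assert shift_into_window(23, 0, 105) == 23  # a = 0 gives back Theorem 1
```

Since u and u' differ by a multiple of m, and every m_i divides m, all congruences are preserved. The mathematical content is small; the effort in a repository lies in updating every proof that refers to the fixed range.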

A second generalization of the CRT is concerned with the underlying algebraic
structure. The integers are the prototype example for Euclidean domains.
Taking into account that the residue class ring $\mathbb{Z}_n$ in fact is the
factor ring of $\mathbb{Z}$ by the ideal $n\mathbb{Z}$, it is rather obvious
that the following generalization 5 holds.

Theorem 4. Let $R$ be a Euclidean domain. Let $m_1, m_2, \ldots, m_r$ be
elements of $R$ such that $m_i$ and $m_j$ are relatively prime for $i \neq j$
and let $m = m_1 m_2 \cdots m_r$. Then we have the ring isomorphism

$R/(m) \cong R/(m_1) \times \cdots \times R/(m_r)$. □

This generalization may cause problems: In mathematical repositories it is an
important difference whether one argues about the set of integers (with the usual
operations) or the ring of integers: They simply have different types. Technically,
this means that in mathematical repositories we often have two different repre- 
sentations of the integers. In the mathematical setting theorems of course hold 
for both of them. However, proofs using one representation will not automati- 
cally work for the other one. Consequently, though Theorem 4 is more general, 
it will not work for proofs using integers instead of the ring of integers; for that 
a similar generalization of Theorem 1 is necessary. So in this case in order to 
make all proofs work with a generalization, we need to provide generalizations 
of different versions of the original theorem — or just change the proofs with the 
"right" representation leading to an unbalanced organization of the repository. 

We close this section with a generalization of the CRT that abstracts
even from algebraic structures. The following theorem [Lün93] deals with sets
and equivalence relations only and presents a condition for the "canonical"
function $\sigma$ to be onto.



5 Literally this is a generalization of Theorem 2, but of course Theorem 1 can be 
analogously generalized to Euclidean domains. 

Theorem 5. Let $\alpha$ and $\beta$ be equivalence relations on a given set $M$.
Let $\sigma : M \to M/\alpha \times M/\beta$ be defined by
$\sigma(x) := (\alpha(x), \beta(x))$. Then we have
$\ker(\sigma) = \alpha \cap \beta$, and $\sigma$ is onto if and only if
$\alpha \circ \beta = M \times M$. □
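For finite sets the statement can be checked mechanically, which also shows how little of the original CRT is left. In the following sketch equivalence relations are represented as sets of pairs; all function names and the two test cases (congruence modulo 2 and 3 on six elements, modulo 2 and 4 on eight elements) are assumptions made for this illustration.

```python
from itertools import product

def relation_from_key(M, key):
    """Equivalence relation on M that identifies x and y whenever key(x) == key(y)."""
    return {(x, y) for x, y in product(M, repeat=2) if key(x) == key(y)}

def compose(alpha, beta):
    """Relational composition: (x, z) whenever (x, y) in alpha and (y, z) in beta."""
    return {(x, z) for (x, y) in alpha for (w, z) in beta if y == w}

def eq_class(rel, M, x):
    """Equivalence class of x, as a hashable set."""
    return frozenset(y for y in M if (x, y) in rel)

def sigma(M, alpha, beta, x):
    """The canonical map x -> (class of x under alpha, class of x under beta)."""
    return (eq_class(alpha, M, x), eq_class(beta, M, x))

def kernel(M, alpha, beta):
    """All pairs identified by sigma."""
    return {(x, y) for x, y in product(M, repeat=2)
            if sigma(M, alpha, beta, x) == sigma(M, alpha, beta, y)}

def sigma_is_onto(M, alpha, beta):
    """Compare the size of the image with the size of M/alpha x M/beta."""
    image = {sigma(M, alpha, beta, x) for x in M}
    n_alpha = len({eq_class(alpha, M, x) for x in M})
    n_beta = len({eq_class(beta, M, x) for x in M})
    return len(image) == n_alpha * n_beta

M = list(range(6))
alpha = relation_from_key(M, lambda x: x % 2)
beta = relation_from_key(M, lambda x: x % 3)
assert kernel(M, alpha, beta) == alpha & beta
assert sigma_is_onto(M, alpha, beta) == (compose(alpha, beta) == set(product(M, repeat=2)))
```

With congruence modulo 2 and modulo 4 on eight elements, the composition is not the full relation and the map is not onto, mirroring the failure of the CRT for moduli that are not relatively prime.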

Here almost all of the familiar CRT gets lost. There are no congruences, no 
algebraic operations, only the factoring (of sets) remains. Therefore, it seems 
hardly possible to adapt proofs using any of the preceding CRTs to work with 
this generalization in a reasonable amount of time. Any application will rely 
on much more concrete structures, so that too much effort has to be spent to 
adapt a proof. Theorem 5 in some sense is too general to reasonably work with. 
However, even though hardly applicable, the theorem stays interesting from a 
didactic point of view. 6 It illustrates how far we sometimes can generalize and 
may provide the starting point of a discussion whether this is — aside from 
mathematical aesthetics — expedient; a topic that is also of great interest for 
the organization of mathematical repositories. 

7 Conclusions 

When building a mathematical repository it seems plausible to not duplicate 
knowledge in order to avoid an unnecessary blow-up of the repository. This is 
similar to — and may be inspired by — mathematical definitions, in which the 
number of axioms is kept as small as possible. 

In this paper we have argued that this, however, is not true in general. We
have analyzed miscellaneous situations in which it might be reasonable or even
necessary to duplicate knowledge in a repository. The reasons for that are
manifold: Different proofs may be interesting for didactic reasons, or different
representations of the same knowledge may better support different groups of
users. Even improvements of a repository may lead to duplication of knowledge
because, for example, the improved version of a theorem cannot always be
trivially erased without reworking lots of proofs.

In general, it is hardly foreseeable in which cases which kind of knowledge
should be duplicated. This strongly depends on the different kinds of users the
repository should attract.